From SweSum to ScandSum - Automatic text summarization for the Scandinavian languages

Author

  • Hercules Dalianis
Abstract

In automatic text summarization, the most relevant parts of a document are extracted and put together in a non-redundant summary that is shorter than the original document. A more advanced form of summarization is multi-text summarization, where several texts are condensed into one summary. As the amount of information on the Internet grows, it becomes difficult to select what is relevant. Automatic text summarization can automate this selection completely, or at least assist in the process, and it is extremely useful in combination with a search engine on the Web. In particular, automatic text summarization can be used to prepare information for use in small mobile devices, which may need considerable reduction of content. The techniques used in automatic summarization have interesting spin-off effects in the area of advanced search engine technologies in the form of query expansion techniques, such as stemming, the use of thesauri, and spell checking of the query.

Current Scandinavian summarization tools

SweSum is an automatic text summarizer for Swedish (SweSum 2002) developed at KTH (see Figure 1). Within this network we have now developed the first version of a Danish summarizer. In the commercial area, the Norwegian company Cognit AS (Cognit 2002) has a summarizer called Corporum summarizer, available for Norwegian, Swedish, German and English. Among the Norwegian language resources that are being reused, the following deserve special mention: (a) a word form lexicon with explicit relations between variants in five different subnorms of Bokmål, developed in the European project SCARRIE aimed at spelling and grammar correction in Scandinavian languages, and (b) a part-of-speech tagger developed jointly by the Humanities Information Technology centre at Bergen and Tekstlaboratoriet in Oslo. The summarizer is currently written in Perl.

Evaluation

Evaluation is an important task in automatic text summarization.
Although systems such as summarizers still depend on frequency calculations over shallowly analyzed texts to approximate the relevance of discourse entities, a switch from a stemmer to a lemmatizer would clearly allow their overall performance to improve considerably.

Rapid development in mobile communication has enabled the distribution of both textual and multimedia information to various kinds of mobile devices. There are numerous research projects on information services for mobile users, as well as commercial services (e.g. Plucker or iSilo) that provide offline information for mobile devices. Mobile users may have different information needs depending on their social and environmental contexts as well as their personal interests. Summarization techniques obviously have a key role in this context. Increased pressure for advances in summarization technology is coming from mobile users of the web, on-line information sources and new mobile devices, as well as from the need for corporate knowledge management. Commercial companies are increasingly starting to offer text summarization capabilities, often bundled with information retrieval tools. Thus, text summarization for distribution to mobile platforms can be considered a major area of interest within Nordic language technology.
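The frequency-based extraction mentioned above can be illustrated with a minimal sketch. This is not SweSum's or ScandSum's actual algorithm (the abstract does not give it, and the original summarizer is written in Perl). It is a generic term-frequency extractor written in Python for illustration. The function names, the tiny stopword list and the scoring formula (average word frequency per sentence) are all assumptions introduced here.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# a real system would use a fuller, language-specific list.
STOPWORDS = {"the", "a", "an", "of", "in", "is", "and", "to", "on", "for"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased word tokens without stopwords.

    \\w is Unicode-aware for str patterns, so letters such as å, ä, ö, ø, æ are kept.
    No stemming or lemmatization is applied; a real system would normalize
    word forms before counting.
    """
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return up to max_sentences of the highest-scoring sentences, in original order."""
    sentences = split_sentences(text)
    # Document-wide word frequencies approximate how relevant each word is.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average frequency, so long sentences are not favored just for their length.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore document order
    return [sentences[i] for i in keep]
```

Calling `summarize(text, max_sentences=1)` returns a one-element list holding the sentence whose words are, on average, most frequent in the document. Replacing the raw tokens in `tokenize` with stems or lemmas is where a stemmer or lemmatizer would plug in, since inflected variants of the same word would then be counted together.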

Similar resources

To search and summarize in Scandinavia

Automatic text summarization is the method where a computer summarizes a text. A text is given to the computer and it returns a shorter, non-redundant text. Text summarization can be used to summarize news in the Business Intelligence domain, automatically edit news in the newspaper setting domain and summarize news down to a length suitable for SMS and WAP but also to summarize news before the...

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

Development of a Swedish Corpus for Evaluating Summarizers and other IR-tools

We present the construction of a Swedish corpus aimed at research on Information Retrieval, Information Extraction, Named Entity Recognition and Multi Text Summarization. We also present the results of evaluating our Swedish text summarizer SweSum with this corpus. The corpus has been constructed by using Internet agents downloading Swedish newspaper text from various sources. A sma...

Systematic literature review of fuzzy logic based text summarization

"Information Overload" is not a new term, but the massive development in technology, which enables anytime, anywhere, easy and unlimited access to, participation in and publishing of information, has escalated its impact. Assisting users' informational searches with reduced reading and surfing time by extracting and evaluating accurate, authentic and relevant information are the primary c...

An Optimization Text Summarization Method Based on Naïve Bayes and Topic Word for Single Syllable Language

Text summarization dates back to the late 1950s, when a simple technique based on term frequency was applied to the summarization of technical texts at IBM. After more than 50 years of development, text summarization is still a hot topic, attracting many researchers and scholars in the fields of data mining and natural language processing, with proposals for the development of the text summa...

Publication date: 2003